[codex] Support raw image offload in v1 train client#1746

Draft
eligotts wants to merge 2 commits into main from codex/v1-raw-image-offload

eligotts commented Jun 18, 2026

Contributor

Summary

  • tighten v1 multimodal graph serialization around strict raw descriptor sidecars
  • reject processed multimodal payload keys recursively, including nested pixel_values, image_embeds, and image_features
  • update v1 multimodal tests to use strict prime_raw_mm_item envelopes instead of descriptor-only Qwen payloads
  • keep raw image offload and retry behavior aligned with the companion Renderers and Prime-RL PRs
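The recursive rejection described in the summary could look roughly like the sketch below. It is an illustration only, not the PR's code: the function names and the `$`-style path format are hypothetical, the `prime_raw_mm_item` envelope shape in the usage lines is assumed, and the deny-list contains only the three keys named above.

```python
from collections.abc import Mapping, Sequence

# Key names taken from the PR summary; the real deny-list may be longer.
PROCESSED_KEYS = frozenset({"pixel_values", "image_embeds", "image_features"})


def find_processed_keys(payload, path="$"):
    """Return the path of every processed-payload key found anywhere in `payload`."""
    hits = []
    if isinstance(payload, Mapping):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in PROCESSED_KEYS:
                hits.append(child)
            # Keep descending so nested offenders are reported too.
            hits.extend(find_processed_keys(value, child))
    elif isinstance(payload, Sequence) and not isinstance(payload, (str, bytes, bytearray)):
        for index, item in enumerate(payload):
            hits.extend(find_processed_keys(item, f"{path}[{index}]"))
    return hits


def reject_processed_payload(payload):
    """Raise TypeError if any processed image payload key is present (hypothetical helper)."""
    hits = find_processed_keys(payload)
    if hits:
        raise TypeError(f"processed multimodal payload keys are not allowed: {hits}")


# Assumed envelope shape: a raw descriptor wrapped in a prime_raw_mm_item key.
raw_sidecar = {"image": [{"prime_raw_mm_item": {"uri": "file:///tmp/a.png"}}]}
reject_processed_payload(raw_sidecar)  # no exception: raw descriptors only
```

Collecting every offending path before raising, rather than failing on the first hit, makes the error message more useful when a stored node carries several processed fields.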

Companion PRs

Notes

  • Draft/WIP: this depends on the renderer generic raw multimodal ref contract in the companion PR.
  • v1 multimodal sidecars intentionally carry raw descriptors only, not processed image tensors or image-processor payloads.
  • Prime/vLLM materialization happens from raw image refs rather than Verifiers-held processor outputs.

Validation

  • Commit hooks: ruff check, ruff format, and the generated AGENTS/CLAUDE check all passed.
  • Push hook: ty (ci parity) passed.
  • End-to-end hosted-style smoke test through Prime-RL, using /home/ubuntu/verifiers, /home/ubuntu/renderers, and /home/ubuntu/prime-rl-v1-raw-mm-offload: inference, env rollouts, train batch creation, and trainer step 0 all completed, and the strict trainer-bound raw image refs decoded successfully.

Note

Add raw image offload and multimodal bridging to v1 train client

  • Adds offload_images_inplace in utils/multimodal.py to rewrite image_url parts in wire bodies and typed messages to file:// run assets in-place, enforcing that all image URLs resolve to file references.
  • TrainClient now calls image offload in prepare_request_body and prepare_messages, and bridges multimodal turns by forwarding previous_multi_modal_data to the renderer on each turn.
  • Adds _generate_with_image_ref_retry to retry generate once with materialize_all_image_refs=True when vLLM returns a missing_mm_cache_item error and only descriptor-only image refs are present.
  • MessageNode serialization/deserialization now rejects processed image payloads (e.g. pixel_values ndarrays) and enforces raw image descriptors only; PendingTurn gains a previous_multi_modal_data method that merges sidecars across the reusable prefix.
  • Risk: Existing multimodal data containing ndarray payloads will raise TypeError on serialize or deserialize, which is a breaking change for any stored nodes carrying processed image tensors.
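The retry-once behavior attributed to _generate_with_image_ref_retry can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the exception class stands in for however the client surfaces a missing_mm_cache_item error from vLLM, and the `generate` and `has_descriptor_only_refs` callables are hypothetical parameters introduced here.

```python
class MissingMMCacheItemError(RuntimeError):
    """Hypothetical stand-in for a vLLM response carrying a missing_mm_cache_item error."""


def generate_with_image_ref_retry(generate, request, has_descriptor_only_refs):
    """Call `generate`; on a cache miss, retry once with all image refs materialized."""
    try:
        return generate(request, materialize_all_image_refs=False)
    except MissingMMCacheItemError:
        # Retrying only helps when the request relied on descriptor-only refs.
        if not has_descriptor_only_refs(request):
            raise
        # Second and final attempt: a failure here propagates to the caller.
        return generate(request, materialize_all_image_refs=True)
```

Bounding the retry to a single attempt keeps a persistent cache problem from turning into a loop; a second failure reaches the caller unchanged.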

Macroscope summarized 6f9c55e. (Automatic summaries will resume when PR exits draft mode or review begins).

eligotts force-pushed the codex/v1-raw-image-offload branch from 7556743 to 3f5bb1a on June 23, 2026 19:23
eligotts changed the base branch from feat/nano-as-v1 to main on June 23, 2026 19:24
